Forward vs. Inverse Methods for Using Near-Real-Time Streamflow Observation Data in Long Short-Term Memory Networks

Presentation at the AGU 2021 Fall Meeting on approaches to improving LSTM streamflow predictions with near-real-time observation data.

Abstract

Ingesting near-real-time observation data is a critical component of many operational hydrological forecasting systems. There are generally two classes of strategies for ingesting streamflow observations into rainfall-runoff models: autoregression (AR) is a core component of many statistical hydrology models, and data assimilation (DA) is used in conceptual and process-based models.

Long Short-Term Memory networks (LSTMs) are currently the most accurate and extrapolatable streamflow models available from the hydrological science community. LSTMs are frequently used for autoregression, and, like dynamical systems models, they have an explicit state space, which means that they can be used with DA. AR is more straightforward to implement than DA (which necessarily requires some type of inverse method); however, DA has the advantage of being less sensitive to missing data. Missing data is a problem for operational forecasting, especially in developing countries, where streamflow monitoring infrastructure can be unreliable. For example, Google’s flood forecasting model regularly encounters 10-20% missing streamflow data in a given monsoon season in India.

In this project we compare accuracy and robustness of variational DA and AR for streamflow forecasting with LSTMs. Variational DA is implemented using the same tensor network as the model, meaning that it is accomplished via backpropagation in a way that requires only minimal additional computational infrastructure beyond the LSTM model itself. AR is implemented in a way that is robust to missing data by using a binary input flag that signals when a streamflow observation is missing. Both models can be used in an operational setting.
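
As an illustration of the missing-data flag described above, the following is a minimal sketch and not the authors' implementation. The function name `build_ar_inputs`, the array shapes, and the choice to fill missing observations with `0.0` are all assumptions made for this example; the abstract states only that a binary input flag signals when a streamflow observation is missing.

```python
import numpy as np


def build_ar_inputs(forcings, lagged_flow, fill_value=0.0):
    """Append lagged streamflow and a missing-data flag to the model inputs.

    forcings:    array of shape (time, n_features) with meteorological inputs.
    lagged_flow: array of shape (time,) holding the most recent streamflow
                 observation available at each step, NaN where none exists.
    fill_value:  placeholder written where the observation is missing
                 (hypothetical choice; the flag tells the model to ignore it).
    """
    forcings = np.asarray(forcings, dtype=float)
    lagged_flow = np.asarray(lagged_flow, dtype=float)

    missing = np.isnan(lagged_flow)                       # True where no observation
    flow_filled = np.where(missing, fill_value, lagged_flow)
    flag = missing.astype(float)                          # 1.0 = missing, 0.0 = observed

    # Resulting columns: [forcings..., lagged streamflow, missing flag]
    return np.column_stack([forcings, flow_filled, flag])


# Three time steps, two forcing features, one missing observation.
forcings = np.array([[1.2, 15.0], [0.0, 16.5], [3.4, 14.1]])
lagged_flow = np.array([2.0, np.nan, 2.7])
x = build_ar_inputs(forcings, lagged_flow)
```

The resulting array contains no NaN values, so it can be passed to a network as is, while the final column lets the model learn to disregard the placeholder whenever the flag is set.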

Models were tested on 10 years' worth of daily streamflow observations from 531 basins in the continental US using the NCAR CAMELS dataset. The AR approach is (i) more accurate, (ii) more computationally efficient, and (iii) easier to implement than DA. Care must be taken to train AR models in a way that is robust to missing data, and we describe the tradeoffs inherent in this AR training procedure in detail.

Link

Citation

@inproceedings{nearing2022agu,
  title={Forward vs. Inverse Methods for Using Near-Real-Time Streamflow Observation Data in Long Short-Term Memory Networks},
  author={Nearing, Grey Stephen and Kratzert, Frederik and Klotz, Daniel and Gauch, Martin and Frame, Jonathan and Nevo, Sella},
  booktitle={AGU Fall Meeting 2021},
  venue={New Orleans, LA},
  year={2021},
  organization={AGU}
}